Proposing Multimodal Integration Model Using LSTM and Autoencoder

Authors

  • Wataru Noguchi
  • Hiroyuki Iizuka
  • Masahito Yamamoto
Abstract

We propose a neural network architecture that can learn and integrate sequential multimodal information using Long Short-Term Memory (LSTM). Our model consists of encoder and decoder LSTMs and a multimodal autoencoder. To integrate sequential multimodal information, the encoder LSTM first encodes a sequential input into a fixed-length feature vector for each modality. The multimodal autoencoder then integrates the feature vectors from the modalities and generates a fused feature vector that contains the sequential multimodal information in a mixed form. The original feature vectors of each modality are regenerated from the fused feature vector inside the multimodal autoencoder, and the decoder LSTM decodes the sequential inputs from the regenerated feature vectors. Our model is trained on visual and motion sequences of humans and is tested on recall tasks. The experimental results show that our model can learn and remember the sequential multimodal inputs and, by using the integrated multimodal information, reduce the ambiguity that arises at the learning stage of the LSTMs. Our model can also recall the visual sequences from only the motion sequences, and vice versa.
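As a rough illustration of the pipeline described above, the following is a minimal sketch in PyTorch. It is not the authors' implementation; the layer sizes, variable names, fusion by concatenation, and the way the regenerated feature vector is fed to the decoders are all assumptions.

```python
import torch
import torch.nn as nn

class MultimodalLSTMAutoencoder(nn.Module):
    """Hypothetical sketch: per-modality encoder LSTMs, a multimodal
    autoencoder that fuses and regenerates the feature vectors, and
    decoder LSTMs that reconstruct the input sequences."""

    def __init__(self, vision_dim, motion_dim, feat_dim=256, fused_dim=256):
        super().__init__()
        # Encoder LSTMs: each compresses one modality's sequence into a
        # fixed-length feature vector (its final hidden state).
        self.vision_enc = nn.LSTM(vision_dim, feat_dim, batch_first=True)
        self.motion_enc = nn.LSTM(motion_dim, feat_dim, batch_first=True)
        # Multimodal autoencoder: fuses both feature vectors into one shared
        # vector, then regenerates the per-modality feature vectors from it.
        self.fuse = nn.Sequential(nn.Linear(2 * feat_dim, fused_dim), nn.Tanh())
        self.defuse = nn.Sequential(nn.Linear(fused_dim, 2 * feat_dim), nn.Tanh())
        # Decoder LSTMs: reconstruct each sequence from the regenerated vector.
        self.vision_dec = nn.LSTM(feat_dim, vision_dim, batch_first=True)
        self.motion_dec = nn.LSTM(feat_dim, motion_dim, batch_first=True)

    def forward(self, vision_seq, motion_seq):
        # vision_seq: (batch, T, vision_dim); motion_seq: (batch, T, motion_dim)
        _, (h_v, _) = self.vision_enc(vision_seq)
        _, (h_m, _) = self.motion_enc(motion_seq)
        fused = self.fuse(torch.cat([h_v[-1], h_m[-1]], dim=1))
        z_v, z_m = self.defuse(fused).chunk(2, dim=1)
        # Feed the regenerated feature vector to the decoder at every step.
        T = vision_seq.size(1)
        vision_rec, _ = self.vision_dec(z_v.unsqueeze(1).repeat(1, T, 1))
        motion_rec, _ = self.motion_dec(z_m.unsqueeze(1).repeat(1, T, 1))
        return vision_rec, motion_rec
```

Cross-modal recall as described in the abstract would presumably amount to encoding only one modality, substituting a neutral (e.g. zero) vector for the missing one, and reading the other modality from its decoder; the details of that procedure are not given here.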


Similar articles

Multimodal Emotion Recognition Using Deep Neural Networks

The change of emotions is a temporally dependent process. In this paper, a Bimodal-LSTM model is introduced to take temporal information into account for emotion recognition from multimodal signals. We extend the implementation of denoising autoencoders and adopt the Bimodal Deep Denoising AutoEncoder model. Both models are evaluated on a public dataset, SEED, using EEG features and eye movement ...
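A very rough sketch of what a bimodal LSTM classifier over EEG and eye-movement feature sequences could look like is given below; the feature dimensions, fusion by concatenation, and the three-class output are assumptions, not details taken from the paper.

```python
import torch
import torch.nn as nn

class BimodalLSTMClassifier(nn.Module):
    def __init__(self, eeg_dim, eye_dim, hidden=128, n_classes=3):
        super().__init__()
        # One LSTM branch per modality captures its temporal dynamics.
        self.eeg_lstm = nn.LSTM(eeg_dim, hidden, batch_first=True)
        self.eye_lstm = nn.LSTM(eye_dim, hidden, batch_first=True)
        self.classifier = nn.Linear(2 * hidden, n_classes)

    def forward(self, eeg_seq, eye_seq):
        # Summarize each modality with its final hidden state, fuse by
        # concatenation, and classify the emotion.
        _, (h_eeg, _) = self.eeg_lstm(eeg_seq)
        _, (h_eye, _) = self.eye_lstm(eye_seq)
        return self.classifier(torch.cat([h_eeg[-1], h_eye[-1]], dim=1))
```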


Speech dereverberation using long short-term memory

Recently, neural networks have been used not only for phone recognition but also for denoising and dereverberation. However, the conventional denoising deep autoencoder (DAE), based on a feed-forward structure, is not capable of handling the very long spans of speech frames affected by reverberation. An LSTM can be effectively trained to reduce the average error between the enhanced signal and the original clean signal by ...
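As an illustration of that training setup, the sketch below maps reverberant spectral frames to clean frames with an LSTM and minimizes the mean squared error; the feature type, layer sizes, optimizer, and dummy data are assumptions rather than the paper's configuration.

```python
import torch
import torch.nn as nn

class DereverbLSTM(nn.Module):
    def __init__(self, n_bins=257, hidden=512):
        super().__init__()
        self.lstm = nn.LSTM(n_bins, hidden, num_layers=2, batch_first=True)
        self.proj = nn.Linear(hidden, n_bins)

    def forward(self, reverberant):            # (batch, frames, n_bins)
        out, _ = self.lstm(reverberant)
        return self.proj(out)                   # enhanced frames

model = DereverbLSTM()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()                          # average error vs. the clean target

reverberant = torch.randn(8, 100, 257)          # dummy batch of feature sequences
clean = torch.randn(8, 100, 257)
loss = loss_fn(model(reverberant), clean)
loss.backward()
optimizer.step()
```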


Arbitrary Discrete Sequence Anomaly Detection with Zero Boundary LSTM

We propose a simple mathematical definition and a new neural architecture for finding anomalies within discrete sequence datasets. Our model comprises a modified LSTM autoencoder and an array of One-Class SVMs. The LSTM takes in elements from a sequence and creates context vectors that are used to predict the probability distribution of the following element. These context vectors are then used ...
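The snippet breaks off before the second stage, but a common way to use such context vectors is sketched below with scikit-learn (a single One-Class SVM for brevity, whereas the paper describes an array of them; all names, parameters, and the random stand-in data are assumptions): fit on context vectors from normal sequences, then flag outlying vectors at test time.

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
normal_contexts = rng.normal(size=(1000, 64))   # stand-ins for LSTM context vectors
test_contexts = rng.normal(size=(50, 64))

ocsvm = OneClassSVM(kernel="rbf", nu=0.05, gamma="scale")
ocsvm.fit(normal_contexts)                       # train on normal data only
labels = ocsvm.predict(test_contexts)            # +1 = inlier, -1 = anomaly
print(int((labels == -1).sum()), "positions flagged as anomalous")
```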


Google's Next-Generation Real-Time Unit-Selection Synthesizer Using Sequence-to-Sequence LSTM-Based Autoencoders

A neural network model that significantly improves unit-selection-based Text-To-Speech synthesis is presented. The model employs a sequence-to-sequence LSTM-based autoencoder that compresses the acoustic and linguistic features of each unit to a fixed-size vector referred to as an embedding. Unit selection is facilitated by formulating the target cost as an L2 distance in the embedding space. In o...
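That embedding-based target cost can be illustrated in a few lines (the dimensions and data below are made up): each candidate unit's cost is its L2 distance to the target embedding, and the cheapest candidate is preferred.

```python
import numpy as np

rng = np.random.default_rng(0)
target_embedding = rng.normal(size=64)             # embedding of the desired unit
candidate_embeddings = rng.normal(size=(500, 64))  # embeddings of database units

# Target cost = L2 distance between the target and each candidate embedding.
target_cost = np.linalg.norm(candidate_embeddings - target_embedding, axis=1)
best = int(np.argmin(target_cost))
print("cheapest candidate:", best, "cost:", float(target_cost[best]))
```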


Multimodal Transportation p-hub Location Routing Problem with Simultaneous Pick-ups and Deliveries

Centralizing flows and using proper transportation facilities cuts down costs and traffic. Hub facilities concentrate flows to create economies of scale, and multimodal transportation makes it possible to exploit the advantages of other transporters. A distinctive feature of this paper is the proposal of a new mathematical formulation for a three-stage p-hub location routing problem with simultaneous pick-ups and de...



Journal:
  • ICST Trans. Security Safety

Volume 3, Issue

Pages -

Publication date 2015